🤖 Inject mode transition context for flaky test #224 #225

ammar-agent · 2025-10-13T18:01:21Z

Problem

The integration test "should include mode-specific instructions in system message" is flaky (#224). When the mode is switched mid-conversation, models sometimes respond with the wrong mode marker ([PLAN_MODE_ACTIVE] instead of [EXEC_MODE_ACTIVE]).

Root cause: Models see conflicting signals:

- System message says "You're in EXEC mode"
- Conversation history shows they just said they were in PLAN mode
- User prompt is vague ("Please respond.") with no transition signal

Models sometimes prioritize conversation consistency over system instructions.
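For illustration, the request the model receives after the switch might look roughly like the sketch below. The `ChatMessage` shape and the earlier user turn are assumptions made for this example; only the quoted strings come from the description above.

```typescript
// Assumed, simplified chat message shape (not the repo's actual types).
type Role = "system" | "user" | "assistant";
interface ChatMessage {
  role: Role;
  content: string;
}

const requestAfterSwitch: ChatMessage[] = [
  { role: "system", content: "You're in EXEC mode" },
  { role: "user", content: "Please respond." }, // hypothetical earlier turn
  { role: "assistant", content: "[PLAN_MODE_ACTIVE]" }, // reply given while in plan mode
  { role: "user", content: "Please respond." }, // no hint that the mode changed
];

// The most recent assistant turn claims PLAN while the system message says EXEC.
const lastAssistant = [...requestAfterSwitch]
  .reverse()
  .find((m) => m.role === "assistant");
const historySaysPlan = lastAssistant?.content.includes("PLAN_MODE_ACTIVE") ?? false;
const systemSaysExec = requestAfterSwitch[0].content.includes("EXEC");

console.log({ historySaysPlan, systemSaysExec });
```

Nothing in the conversation itself tells the model that the system message and the history now disagree.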

Solution

Inject the mode transition as a synthetic user message in the conversation flow, so the switch appears at the point in the history where it happened.

When mode changes, insert a synthetic user message before the final user message:

[Mode switched from plan to exec. Follow exec mode instructions.]

Benefits:

✅ Temporal - transition happens in natural conversation flow
✅ Models handle in-message context better than system changes
✅ Simple - no metadata persistence complexity
✅ Works for both tests and production usage

Implementation

- Added injectModeTransition() to modelMessageTransform.ts
- Operates on CmuxMessage[] where metadata.mode is available
- Called after addInterruptedSentinel, before converting to ModelMessage
- Mode persisted in assistant message metadata for next request
- Added comprehensive unit tests (5 test cases)
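
The steps above can be sketched as follows. This is a minimal approximation, not the code merged in this PR: the `CmuxMessage` shape, the `synthetic` flag, the `id` format, and the function signature are assumptions made for this example. Only the function name, the use of `metadata.mode`, the transition text, and the "no user messages" edge case come from the description and commit messages.

```typescript
// Hypothetical, simplified types; the real CmuxMessage in the repo is richer.
type Mode = "plan" | "exec";

interface CmuxMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
  metadata?: { mode?: Mode; synthetic?: boolean };
}

function injectModeTransition(
  messages: CmuxMessage[],
  currentMode?: Mode,
): CmuxMessage[] {
  if (!currentMode) return messages;

  // Locate the final user message; with no user messages there is nothing to inject before.
  let lastUserIndex = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") {
      lastUserIndex = i;
      break;
    }
  }
  if (lastUserIndex === -1) return messages;

  // Read the mode recorded on the most recent earlier assistant message.
  let previousMode: Mode | undefined;
  for (let i = lastUserIndex - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.role === "assistant" && m.metadata?.mode) {
      previousMode = m.metadata.mode;
      break;
    }
  }

  // No recorded mode, or no change: return the input untouched.
  if (!previousMode || previousMode === currentMode) return messages;

  const transition: CmuxMessage = {
    id: `mode-transition-${lastUserIndex}`, // assumed id scheme
    role: "user",
    text: `[Mode switched from ${previousMode} to ${currentMode}. Follow ${currentMode} mode instructions.]`,
    metadata: { synthetic: true }, // assumed marker for generated messages
  };

  // Insert the synthetic message immediately before the final user message.
  return [
    ...messages.slice(0, lastUserIndex),
    transition,
    ...messages.slice(lastUserIndex),
  ];
}

const history: CmuxMessage[] = [
  { id: "1", role: "user", text: "Please respond." },
  { id: "2", role: "assistant", text: "[PLAN_MODE_ACTIVE]", metadata: { mode: "plan" } },
  { id: "3", role: "user", text: "Please respond." },
];

const withTransition = injectModeTransition(history, "exec");
console.log(withTransition.map((m) => m.text));
```

Returning the original array when nothing changes keeps the transform a no-op for conversations that stay in a single mode.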

Testing

✅ All unit tests pass
✅ CI will verify improved integration test reliability
✅ Works for real-world mode switches mid-conversation

Closes #224

- Add 'mode' field to CmuxMetadata to track mode per message
- Detect mode switches by comparing with last assistant message
- Inject explicit transition note when mode changes mid-conversation
- Helps models understand they should follow new mode instructions

Addresses #224 - flaky mode-specific instructions test

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

src/services/aiService.ts

Changed approach: inject mode switch as synthetic user message instead of system instruction enhancement.

Benefits:
- More temporal - transitions happen in conversation flow
- Models handle in-message context better than system changes
- Simpler - no need to persist mode across stream lifecycle
- Avoids metadata persistence issue caught by Codex

The synthetic message is inserted before the last user message when mode changes, providing natural context for the model.

Implementation:
- Added injectModeTransition() to modelMessageTransform.ts
- Operates on CmuxMessage[] where metadata is available
- Inserts synthetic user message: '[Mode switched from X to Y]'
- Called after addInterruptedSentinel, before conversion to ModelMessage

Co-authored-by: Codex (review feedback)

- Added 5 unit tests for injectModeTransition()
- Fixed edge case: don't inject when no user messages exist
- Pass mode to StreamManager so it persists in final history
- Updated PR description to be clearer and more concise

Co-authored-by: Codex (persistence fix)

ammario

I've verified in conversation that the agent tracks its current mode much better with this fix.

chatgpt-codex-connector bot reviewed Oct 13, 2025

src/services/aiService.ts (resolved)

ammar-agent added 3 commits October 13, 2025 13:07

🤖 Fix formatting

cbfc8a8

ammario approved these changes Oct 13, 2025

ammario merged commit fa61ff6 into main Oct 13, 2025
7 checks passed

ammario deleted the flakes branch October 13, 2025 18:19
